Dataset statistics
| Number of variables | 25 |
|---|---|
| Number of observations | 10053 |
| Missing cells | 12454 |
| Missing cells (%) | 5.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.4 MiB |
| Average record size in memory | 151.0 B |
Variable types
| NUM | 10 |
|---|---|
| CAT | 8 |
| BOOL | 7 |
Warnings
| Warning | Type |
|---|---|
| property_subtype_median_facades is highly correlated with building_property_subtype_median_facades | High correlation |
| building_property_subtype_median_facades is highly correlated with property_subtype_median_facades | High correlation |
| building_state_median_price is highly correlated with building_state_agg | High correlation |
| building_state_agg is highly correlated with building_state_median_price | High correlation |
| building_property_subtype_median_facades is highly correlated with property_subtype and 1 other fields | High correlation |
| property_subtype is highly correlated with building_property_subtype_median_facades and 1 other fields | High correlation |
| property_subtype_median_facades is highly correlated with property_subtype and 1 other fields | High correlation |
| facades_number has 9994 (99.4%) missing values | Missing |
| building_property_subtype_median_facades has 1230 (12.2%) missing values | Missing |
| property_subtype_median_facades has 1230 (12.2%) missing values | Missing |
| garden_area is highly skewed (γ1 = 28.71980321) | Skewed |
| Unnamed: 0 has unique values | Unique |
| rooms_number has 249 (2.5%) zeros | Zeros |
| terrace_area has 5499 (54.7%) zeros | Zeros |
| garden_area has 7911 (78.7%) zeros | Zeros |
| land_surface has 5435 (54.1%) zeros | Zeros |
Reproduction
| Analysis started | 2020-11-20 11:13:25.865550 |
|---|---|
| Analysis finished | 2020-11-20 11:14:02.622503 |
| Duration | 36.76 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
Unnamed: 0
Real number (ℝ≥0)
| Distinct | 10053 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5897.859942 |
|---|---|
| Minimum | 0 |
| Maximum | 11287 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 78.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 678.6 |
| Q1 | 3138 |
| median | 6068 |
| Q3 | 8679 |
| 95-th percentile | 10761.4 |
| Maximum | 11287 |
| Range | 11287 |
| Interquartile range (IQR) | 5541 |
Descriptive statistics
| Standard deviation | 3219.503486 |
|---|---|
| Coefficient of variation (CV) | 0.5458765582 |
| Kurtosis | -1.172085138 |
| Mean | 5897.859942 |
| Median Absolute Deviation (MAD) | 2760 |
| Skewness | -0.09608453574 |
| Sum | 59291186 |
| Variance | 10365202.7 |
| Monotonicity | Strictly increasing |
| Value | Count | Frequency (%) | |
| 2047 | 1 | < 0.1% | |
| 3355 | 1 | < 0.1% | |
| 7465 | 1 | < 0.1% | |
| 5416 | 1 | < 0.1% | |
| 9510 | 1 | < 0.1% | |
| 3363 | 1 | < 0.1% | |
| 7457 | 1 | < 0.1% | |
| 5408 | 1 | < 0.1% | |
| 9502 | 1 | < 0.1% | |
| 1306 | 1 | < 0.1% | |
| Other values (10043) | 10043 | 99.9% |
| Value | Count | Frequency (%) | |
| 0 | 1 | < 0.1% | |
| 1 | 1 | < 0.1% | |
| 2 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 11287 | 1 | < 0.1% | |
| 11286 | 1 | < 0.1% | |
| 11284 | 1 | < 0.1% | |
| 11283 | 1 | < 0.1% | |
| 11282 | 1 | < 0.1% |
source
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 78.5 KiB |
| Value | Count | Frequency (%) | |
| 6 | 9994 | 99.4% | |
| 4 | 59 | 0.6% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
postcode
Real number (ℝ≥0)
| Distinct | 812 |
|---|---|
| Distinct (%) | 8.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4954.659206 |
|---|---|
| Minimum | 1000 |
| Maximum | 9992 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 78.5 KiB |
Quantile statistics
| Minimum | 1000 |
|---|---|
| 5-th percentile | 1040 |
| Q1 | 1490 |
| median | 4500 |
| Q3 | 8370 |
| 95-th percentile | 9420 |
| Maximum | 9992 |
| Range | 8992 |
| Interquartile range (IQR) | 6880 |
Descriptive statistics
| Standard deviation | 3187.617962 |
|---|---|
| Coefficient of variation (CV) | 0.6433576618 |
| Kurtosis | -1.624508924 |
| Mean | 4954.659206 |
| Median Absolute Deviation (MAD) | 3300 |
| Skewness | 0.09993177295 |
| Sum | 49809189 |
| Variance | 10160908.27 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 8300 | 345 | 3.4% | |
| 1000 | 298 | 3.0% | |
| 9000 | 291 | 2.9% | |
| 1180 | 286 | 2.8% | |
| 1050 | 216 | 2.1% | |
| 8400 | 171 | 1.7% | |
| 4000 | 140 | 1.4% | |
| 1200 | 122 | 1.2% | |
| 1070 | 119 | 1.2% | |
| 1420 | 118 | 1.2% | |
| Other values (802) | 7947 | 79.1% |
| Value | Count | Frequency (%) | |
| 1000 | 298 | 3.0% | |
| 1020 | 46 | 0.5% | |
| 1030 | 112 | 1.1% | |
| 1040 | 67 | 0.7% | |
| 1050 | 216 | 2.1% |
| Value | Count | Frequency (%) | |
| 9992 | 1 | < 0.1% | |
| 9991 | 1 | < 0.1% | |
| 9990 | 9 | 0.1% | |
| 9988 | 3 | < 0.1% | |
| 9981 | 1 | < 0.1% |
house_is
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.8 KiB |
| Value | Count | Frequency (%) | |
| False | 5029 | 50.0% | |
| True | 5024 | 50.0% |
property_subtype
Categorical
| Distinct | 23 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 78.5 KiB |
| Value | Count | Frequency (%) | |
| APARTMENT | 3727 | 37.1% | |
| HOUSE | 3099 | 30.8% | |
| VILLA | 567 | 5.6% | |
| MIXED_USE_BUILDING | 547 | 5.4% | |
| APARTMENT_BLOCK | 454 | 4.5% | |
| DUPLEX | 367 | 3.7% | |
| PENTHOUSE | 297 | 3.0% | |
| GROUND_FLOOR | 266 | 2.6% | |
| FLAT_STUDIO | 202 | 2.0% | |
| MANSION | 113 | 1.1% | |
| Other values (13) | 414 | 4.1% |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Length
| Max length | 20 |
|---|---|
| Median length | 9 |
| Mean length | 8.368049339 |
| Min length | 3 |
price
Real number (ℝ≥0)
| Distinct | 1207 |
|---|---|
| Distinct (%) | 12.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 371368.8246 |
|---|---|
| Minimum | 25000 |
| Maximum | 1396000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 78.5 KiB |
Quantile statistics
| Minimum | 25000 |
|---|---|
| 5-th percentile | 125000 |
| Q1 | 215000 |
| median | 299000 |
| Q3 | 450000 |
| 95-th percentile | 860000 |
| Maximum | 1396000 |
| Range | 1371000 |
| Interquartile range (IQR) | 235000 |
Descriptive statistics
| Standard deviation | 236721.1506 |
|---|---|
| Coefficient of variation (CV) | 0.6374287094 |
| Kurtosis | 3.488524833 |
| Mean | 371368.8246 |
| Median Absolute Deviation (MAD) | 104000 |
| Skewness | 1.742836437 |
| Sum | 3733370794 |
| Variance | 5.603690315e+10 |
| Monotonicity | Not monotonic |
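Several of the price statistics above are derived from others in the same tables, so they can be recomputed as a consistency check. The sketch below does this with the reported values only; it does not touch the underlying data, and the variable names are illustrative.

```python
# Values copied from the price tables above.
mean = 371368.8246
std = 236721.1506
q1, q3 = 215000, 450000
minimum, maximum = 25000, 1396000

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance is the squared standard deviation

print(round(cv, 4))  # 0.6374
print(iqr)           # 235000
print(value_range)   # 1371000
print(variance)      # close to the reported 5.603690315e+10
```

The same relationships hold for every numeric variable in this report, so a mismatch in any of them would point to a transcription error.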
| Value | Count | Frequency (%) | |
| 295000 | 140 | 1.4% | |
| 299000 | 135 | 1.3% | |
| 225000 | 133 | 1.3% | |
| 395000 | 126 | 1.3% | |
| 199000 | 126 | 1.3% | |
| 249000 | 120 | 1.2% | |
| 275000 | 118 | 1.2% | |
| 325000 | 103 | 1.0% | |
| 250000 | 98 | 1.0% | |
| 245000 | 98 | 1.0% | |
| Other values (1197) | 8856 | 88.1% |
| Value | Count | Frequency (%) | |
| 25000 | 1 | < 0.1% | |
| 30000 | 2 | < 0.1% | |
| 39000 | 2 | < 0.1% | |
| 40000 | 1 | < 0.1% | |
| 45000 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 1396000 | 1 | < 0.1% | |
| 1395000 | 17 | 0.2% | |
| 1390000 | 7 | 0.1% | |
| 1385000 | 4 | < 0.1% | |
| 1380000 | 2 | < 0.1% |
rooms_number
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.764547896 |
|---|---|
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 249 |
| Zeros (%) | 2.5% |
| Memory size | 78.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 3 |
| Q3 | 3 |
| 95-th percentile | 5 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.247029294 |
|---|---|
| Coefficient of variation (CV) | 0.4510789254 |
| Kurtosis | 0.1485538856 |
| Mean | 2.764547896 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.3751681032 |
| Sum | 27792 |
| Variance | 1.55508206 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 2 | 3176 | 31.6% | |
| 3 | 3171 | 31.5% | |
| 4 | 1497 | 14.9% | |
| 1 | 1037 | 10.3% | |
| 5 | 636 | 6.3% | |
| 6 | 287 | 2.9% | |
| 0 | 249 | 2.5% |
| Value | Count | Frequency (%) | |
| 0 | 249 | 2.5% | |
| 1 | 1037 | 10.3% | |
| 2 | 3176 | 31.6% | |
| 3 | 3171 | 31.5% | |
| 4 | 1497 | 14.9% |
| Value | Count | Frequency (%) | |
| 6 | 287 | 2.9% | |
| 5 | 636 | 6.3% | |
| 4 | 1497 | 14.9% | |
| 3 | 3171 | 31.5% | |
| 2 | 3176 | 31.6% |
area
Real number (ℝ≥0)
| Distinct | 427 |
|---|---|
| Distinct (%) | 4.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 155.398488 |
|---|---|
| Minimum | 5 |
| Maximum | 470 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 78.5 KiB |
Quantile statistics
| Minimum | 5 |
|---|---|
| 5-th percentile | 56 |
| Q1 | 93 |
| median | 135 |
| Q3 | 200 |
| 95-th percentile | 330 |
| Maximum | 470 |
| Range | 465 |
| Interquartile range (IQR) | 107 |
Descriptive statistics
| Standard deviation | 84.26146565 |
|---|---|
| Coefficient of variation (CV) | 0.5422283494 |
| Kurtosis | 1.151725891 |
| Mean | 155.398488 |
| Median Absolute Deviation (MAD) | 48 |
| Skewness | 1.150419671 |
| Sum | 1562221 |
| Variance | 7099.994594 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 120 | 216 | 2.1% | |
| 100 | 208 | 2.1% | |
| 90 | 203 | 2.0% | |
| 150 | 202 | 2.0% | |
| 200 | 189 | 1.9% | |
| 110 | 170 | 1.7% | |
| 80 | 170 | 1.7% | |
| 160 | 168 | 1.7% | |
| 130 | 156 | 1.6% | |
| 140 | 154 | 1.5% | |
| Other values (417) | 8217 | 81.7% |
| Value | Count | Frequency (%) | |
| 5 | 2 | < 0.1% | |
| 15 | 2 | < 0.1% | |
| 16 | 4 | < 0.1% | |
| 17 | 4 | < 0.1% | |
| 18 | 2 | < 0.1% |
| Value | Count | Frequency (%) | |
| 470 | 10 | 0.1% | |
| 468 | 1 | < 0.1% | |
| 467 | 1 | < 0.1% | |
| 465 | 1 | < 0.1% | |
| 462 | 1 | < 0.1% |
equipped_kitchen_has
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.8 KiB |
| Value | Count | Frequency (%) | |
| True | 8480 | 84.4% | |
| False | 1573 | 15.6% |
furnished
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.8 KiB |
| Value | Count | Frequency (%) | |
| False | 9687 | 96.4% | |
| True | 366 | 3.6% |
open_fire
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.8 KiB |
| Value | Count | Frequency (%) | |
| False | 9486 | 94.4% | |
| True | 567 | 5.6% |
terrace
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.8 KiB |
| Value | Count | Frequency (%) | |
| True | 6713 | 66.8% | |
| False | 3340 | 33.2% |
terrace_area
Real number (ℝ≥0)
| Distinct | 134 |
|---|---|
| Distinct (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.10156172 |
|---|---|
| Minimum | 0 |
| Maximum | 708 |
| Zeros | 5499 |
| Zeros (%) | 54.7% |
| Memory size | 78.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 15 |
| 95-th percentile | 50 |
| Maximum | 708 |
| Range | 708 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 22.84805366 |
|---|---|
| Coefficient of variation (CV) | 2.058093648 |
| Kurtosis | 129.7892583 |
| Mean | 11.10156172 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 7.26852841 |
| Sum | 111604 |
| Variance | 522.0335561 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 0 | 5499 | 54.7% | |
| 20 | 274 | 2.7% | |
| 10 | 252 | 2.5% | |
| 15 | 224 | 2.2% | |
| 8 | 206 | 2.0% | |
| 30 | 197 | 2.0% | |
| 6 | 187 | 1.9% | |
| 12 | 187 | 1.9% | |
| 25 | 173 | 1.7% | |
| 4 | 154 | 1.5% | |
| Other values (124) | 2700 | 26.9% |
| Value | Count | Frequency (%) | |
| 0 | 5499 | 54.7% | |
| 1 | 27 | 0.3% | |
| 2 | 83 | 0.8% | |
| 3 | 127 | 1.3% | |
| 4 | 154 | 1.5% |
| Value | Count | Frequency (%) | |
| 708 | 1 | < 0.1% | |
| 450 | 1 | < 0.1% | |
| 400 | 1 | < 0.1% | |
| 350 | 1 | < 0.1% | |
| 330 | 1 | < 0.1% |
garden
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.8 KiB |
| Value | Count | Frequency (%) | |
| False | 7911 | 78.7% | |
| True | 2142 | 21.3% |
garden_area
Real number (ℝ≥0)
| Distinct | 586 |
|---|---|
| Distinct (%) | 5.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 124.184721 |
|---|---|
| Minimum | 0 |
| Maximum | 40000 |
| Zeros | 7911 |
| Zeros (%) | 78.7% |
| Memory size | 78.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 613.2 |
| Maximum | 40000 |
| Range | 40000 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 860.1838449 |
|---|---|
| Coefficient of variation (CV) | 6.926647966 |
| Kurtosis | 1125.943137 |
| Mean | 124.184721 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 28.71980321 |
| Sum | 1248429 |
| Variance | 739916.2471 |
| Monotonicity | Not monotonic |
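The skewness of 28.72 reported above is what triggers the "Skewed" warning for this variable. As a minimal sketch of how such a value can arise, the function below computes the bias-adjusted sample skewness; it assumes the report uses the same adjusted estimator as pandas' `Series.skew`, which is not confirmed by the report itself. The sample data is invented to mimic a zero-inflated column with a few very large values.

```python
import math

def adjusted_skewness(values):
    """Bias-adjusted sample skewness (adjusted Fisher-Pearson coefficient)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5                            # unadjusted skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1   # small-sample adjustment

# Invented data: mostly zeros plus two large values, not taken from the dataset.
zero_inflated = [0] * 8 + [100, 4000]
symmetric = [1, 2, 3, 4, 5]

print(adjusted_skewness(zero_inflated))  # strongly positive: long right tail
print(adjusted_skewness(symmetric))      # zero for a symmetric sample
```

With 78.7% zeros and a maximum of 40000, a large positive skewness is expected for garden_area.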
| Value | Count | Frequency (%) | |
| 0 | 7911 | 78.7% | |
| 100 | 84 | 0.8% | |
| 50 | 57 | 0.6% | |
| 300 | 54 | 0.5% | |
| 200 | 53 | 0.5% | |
| 400 | 47 | 0.5% | |
| 500 | 44 | 0.4% | |
| 150 | 42 | 0.4% | |
| 30 | 40 | 0.4% | |
| 1 | 38 | 0.4% | |
| Other values (576) | 1683 | 16.7% |
| Value | Count | Frequency (%) | |
| 0 | 7911 | 78.7% | |
| 1 | 38 | 0.4% | |
| 2 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 5 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 40000 | 2 | < 0.1% | |
| 29400 | 1 | < 0.1% | |
| 17800 | 1 | < 0.1% | |
| 15601 | 2 | < 0.1% | |
| 15000 | 1 | < 0.1% |
land_surface
Real number (ℝ≥0)
| Distinct | 1430 |
|---|---|
| Distinct (%) | 14.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 292.9476773 |
|---|---|
| Minimum | 0 |
| Maximum | 4170 |
| Zeros | 5435 |
| Zeros (%) | 54.1% |
| Memory size | 78.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 312 |
| 95-th percentile | 1500 |
| Maximum | 4170 |
| Range | 4170 |
| Interquartile range (IQR) | 312 |
Descriptive statistics
| Standard deviation | 576.0962548 |
|---|---|
| Coefficient of variation (CV) | 1.96655 |
| Kurtosis | 10.9323011 |
| Mean | 292.9476773 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.058475153 |
| Sum | 2945003 |
| Variance | 331886.8948 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 0 | 5435 | 54.1% | |
| 150 | 50 | 0.5% | |
| 100 | 49 | 0.5% | |
| 120 | 44 | 0.4% | |
| 110 | 40 | 0.4% | |
| 200 | 40 | 0.4% | |
| 70 | 38 | 0.4% | |
| 90 | 35 | 0.3% | |
| 160 | 35 | 0.3% | |
| 300 | 34 | 0.3% | |
| Other values (1420) | 4253 | 42.3% |
| Value | Count | Frequency (%) | |
| 0 | 5435 | 54.1% | |
| 1 | 8 | 0.1% | |
| 4 | 1 | < 0.1% | |
| 5 | 1 | < 0.1% | |
| 6 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 4170 | 1 | < 0.1% | |
| 4126 | 1 | < 0.1% | |
| 4100 | 1 | < 0.1% | |
| 4065 | 1 | < 0.1% | |
| 4000 | 3 | < 0.1% |
facades_number
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 6.8% |
| Missing | 9994 |
| Missing (%) | 99.4% |
| Memory size | 78.5 KiB |
| Value | Count | Frequency (%) | |
| 2 | 27 | 0.3% | |
| 3 | 18 | 0.2% | |
| 4 | 12 | 0.1% | |
| 1 | 2 | < 0.1% | |
| (Missing) | 9994 | 99.4% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
swimming_pool_has
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.8 KiB |
| Value | Count | Frequency (%) | |
| False | 9871 | 98.2% | |
| True | 182 | 1.8% |
region
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 78.5 KiB |
| Value | Count | Frequency (%) | |
| F | 5089 | 50.6% | |
| W | 3102 | 30.9% | |
| B | 1862 | 18.5% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
building_state_agg
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 78.5 KiB |
| Value | Count | Frequency (%) | |
| good | 7581 | 75.4% | |
| to_renovate | 1745 | 17.4% | |
| renovated | 727 | 7.2% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 11 |
|---|---|
| Median length | 4 |
| Mean length | 5.576643788 |
| Min length | 4 |
postcode_median_price
Real number (ℝ≥0)
| Distinct | 402 |
|---|---|
| Distinct (%) | 4.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 335273.9037 |
|---|---|
| Minimum | 65000 |
| Maximum | 1350000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 78.5 KiB |
Quantile statistics
| Minimum | 65000 |
|---|---|
| 5-th percentile | 167500 |
| Q1 | 237500 |
| median | 298400 |
| Q3 | 399000 |
| 95-th percentile | 620000 |
| Maximum | 1350000 |
| Range | 1285000 |
| Interquartile range (IQR) | 161500 |
Descriptive statistics
| Standard deviation | 136471.0566 |
|---|---|
| Coefficient of variation (CV) | 0.4070434804 |
| Kurtosis | 0.6017389488 |
| Mean | 335273.9037 |
| Median Absolute Deviation (MAD) | 73400 |
| Skewness | 0.9867318184 |
| Sum | 3370508554 |
| Variance | 1.86243493e+10 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 649000 | 345 | 3.4% | |
| 399000 | 312 | 3.1% | |
| 379000 | 293 | 2.9% | |
| 560000 | 286 | 2.8% | |
| 225000 | 271 | 2.7% | |
| 235000 | 218 | 2.2% | |
| 601000 | 216 | 2.1% | |
| 269000 | 186 | 1.9% | |
| 325000 | 181 | 1.8% | |
| 295000 | 164 | 1.6% | |
| Other values (392) | 7581 | 75.4% |
| Value | Count | Frequency (%) | |
| 65000 | 1 | < 0.1% | |
| 70000 | 1 | < 0.1% | |
| 79500 | 1 | < 0.1% | |
| 92000 | 1 | < 0.1% | |
| 98000 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 1350000 | 1 | < 0.1% | |
| 1150000 | 1 | < 0.1% | |
| 925000 | 5 | < 0.1% | |
| 845000 | 1 | < 0.1% | |
| 799000 | 1 | < 0.1% |
building_state_median_price
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 78.5 KiB |
| Value | Count | Frequency (%) | |
| 320000 | 7581 | 75.4% | |
| 230000 | 1745 | 17.4% | |
| 310000 | 727 | 7.2% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
property_subtype_median_price
Real number (ℝ≥0)
| Distinct | 22 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 316466.8756 |
|---|---|
| Minimum | 118000 |
| Maximum | 652500 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 78.5 KiB |
Quantile statistics
| Minimum | 118000 |
|---|---|
| 5-th percentile | 282500 |
| Q1 | 282500 |
| median | 288000 |
| Q3 | 310000 |
| 95-th percentile | 540000 |
| Maximum | 652500 |
| Range | 534500 |
| Interquartile range (IQR) | 27500 |
Descriptive statistics
| Standard deviation | 80490.29644 |
|---|---|
| Coefficient of variation (CV) | 0.2543403517 |
| Kurtosis | 3.923005571 |
| Mean | 316466.8756 |
| Median Absolute Deviation (MAD) | 5500 |
| Skewness | 1.960431562 |
| Sum | 3181441500 |
| Variance | 6478687820 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 282500 | 3727 | 37.1% | |
| 288000 | 3099 | 30.8% | |
| 540000 | 567 | 5.6% | |
| 310000 | 547 | 5.4% | |
| 357500 | 454 | 4.5% | |
| 325000 | 367 | 3.7% | |
| 495000 | 297 | 3.0% | |
| 315000 | 266 | 2.6% | |
| 149000 | 202 | 2.0% | |
| 525000 | 113 | 1.1% | |
| Other values (12) | 414 | 4.1% |
| Value | Count | Frequency (%) | |
| 118000 | 5 | < 0.1% | |
| 122000 | 5 | < 0.1% | |
| 149000 | 202 | 2.0% | |
| 238000 | 75 | 0.7% | |
| 282500 | 3727 | 37.1% |
| Value | Count | Frequency (%) | |
| 652500 | 62 | 0.6% | |
| 540000 | 567 | 5.6% | |
| 535000 | 1 | < 0.1% | |
| 525000 | 113 | 1.1% | |
| 495000 | 297 | 3.0% |
building_property_subtype_median_facades
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1230 |
| Missing (%) | 12.2% |
| Memory size | 78.5 KiB |
| Value | Count | Frequency (%) | |
| 2 | 4589 | 45.6% | |
| 3 | 3605 | 35.9% | |
| 4 | 629 | 6.3% | |
| (Missing) | 1230 | 12.2% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
property_subtype_median_facades
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1230 |
| Missing (%) | 12.2% |
| Memory size | 78.5 KiB |
| Value | Count | Frequency (%) | |
| 2 | 4589 | 45.6% | |
| 3 | 3605 | 35.9% | |
| 4 | 629 | 6.3% | |
| (Missing) | 1230 | 12.2% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear relationship, the angle of the line to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
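The calculation described above (covariance divided by the product of the standard deviations) can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code the report itself runs; the function name and the sample values are assumptions made for the example.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n-1) factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # raises ZeroDivisionError for a constant input

# Invented values: y is an exact increasing linear function of x.
area = [50, 80, 120, 200]
price = [2 * a + 10 for a in area]
print(pearson_r(area, price))  # approximately 1

# Shifting and rescaling one variable (by a positive factor) leaves r unchanged.
a = [1, 2, 3, 4, 5]
b = [2, 1, 4, 3, 5]
print(pearson_r(a, b), pearson_r([3 * v + 7 for v in a], b))
```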
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
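As a minimal sketch of the description above, ρ can be computed by replacing each value with its rank and then applying Pearson's formula to the ranks. The helper names and the choice to give tied values their average rank are assumptions made for this example, not details taken from the report.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Invented values: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))  # approximately 1: the relationship is perfectly monotonic
print(pearson_r(x, y))     # below 1: the relationship is not linear
```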
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
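The pair-counting definition above can be written out directly. The sketch below implements that definition as stated (often called tau-a), in which tied pairs count as neither concordant nor discordant; library implementations commonly default to a tie-adjusted variant (tau-b), so results on data with many ties may differ from the values behind this report. The function name and sample data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y: counted as neither
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant, 6 pairs
```

The quadratic loop over all pairs is fine for a small illustration but would be slow on the 10053 rows of this dataset.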
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | Unnamed: 0 | source | postcode | house_is | property_subtype | price | rooms_number | area | equipped_kitchen_has | furnished | open_fire | terrace | terrace_area | garden | garden_area | land_surface | facades_number | swimming_pool_has | region | building_state_agg | postcode_median_price | building_state_median_price | property_subtype_median_price | building_property_subtype_median_facades | property_subtype_median_facades |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 6 | 4180 | True | MIXED_USE_BUILDING | 295000.0 | 3.0 | 242.0 | True | False | False | True | 36.0 | True | 1000.0 | 1403.0 | NaN | False | W | good | 229000.0 | 320000.0 | 310000.0 | 2.0 | 2.0 |
| 1 | 1 | 6 | 8730 | True | VILLA | 675000.0 | 4.0 | 349.0 | True | False | False | False | 0.0 | True | 977.0 | 1526.0 | NaN | False | F | good | 241000.0 | 320000.0 | 540000.0 | 4.0 | 4.0 |
| 2 | 2 | 6 | 4020 | True | APARTMENT_BLOCK | 250000.0 | 5.0 | 303.0 | True | False | False | False | 0.0 | False | 0.0 | 760.0 | NaN | False | W | to_renovate | 195000.0 | 230000.0 | 357500.0 | NaN | NaN |
| 3 | 3 | 6 | 1200 | True | HOUSE | 545000.0 | 4.0 | 235.0 | True | True | False | False | 0.0 | False | 0.0 | 63.0 | NaN | False | B | renovated | 445000.0 | 310000.0 | 288000.0 | 3.0 | 3.0 |
| 4 | 4 | 6 | 1190 | True | MIXED_USE_BUILDING | 500000.0 | 2.0 | 220.0 | True | False | False | False | 0.0 | True | 60.0 | 193.0 | NaN | False | B | good | 360000.0 | 320000.0 | 310000.0 | 2.0 | 2.0 |
| 5 | 5 | 6 | 4040 | True | HOUSE | 189000.0 | 3.0 | 200.0 | True | False | False | False | 0.0 | True | 40.0 | 100.0 | NaN | False | W | to_renovate | 229000.0 | 230000.0 | 288000.0 | 3.0 | 3.0 |
| 6 | 6 | 6 | 4540 | True | MIXED_USE_BUILDING | 465000.0 | 4.0 | 400.0 | True | False | False | False | 0.0 | False | 0.0 | 312.0 | NaN | False | W | good | 175000.0 | 320000.0 | 310000.0 | 2.0 | 2.0 |
| 7 | 7 | 6 | 1150 | True | APARTMENT_BLOCK | 650000.0 | 4.0 | 200.0 | True | False | False | True | 4.0 | True | 150.0 | 301.0 | NaN | False | B | good | 620000.0 | 320000.0 | 357500.0 | NaN | NaN |
| 8 | 8 | 6 | 6870 | True | MIXED_USE_BUILDING | 89000.0 | 3.0 | 180.0 | True | False | False | False | 0.0 | False | 0.0 | 96.0 | NaN | False | W | to_renovate | 124700.0 | 230000.0 | 310000.0 | 2.0 | 2.0 |
| 9 | 9 | 6 | 4030 | True | MIXED_USE_BUILDING | 129000.0 | 3.0 | 156.0 | True | False | False | False | 0.0 | False | 0.0 | 71.0 | NaN | False | W | to_renovate | 190000.0 | 230000.0 | 310000.0 | 2.0 | 2.0 |
Last rows
| | Unnamed: 0 | source | postcode | house_is | property_subtype | price | rooms_number | area | equipped_kitchen_has | furnished | open_fire | terrace | terrace_area | garden | garden_area | land_surface | facades_number | swimming_pool_has | region | building_state_agg | postcode_median_price | building_state_median_price | property_subtype_median_price | building_property_subtype_median_facades | property_subtype_median_facades |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10043 | 11277 | 4 | 9690 | False | APARTMENT | 315000.0 | 3.0 | 192.0 | True | False | False | True | 48.0 | False | 0.0 | 0.0 | 3.0 | False | F | renovated | 299000.0 | 310000.0 | 282500.0 | 2.0 | 2.0 |
| 10044 | 11278 | 4 | 8300 | False | APARTMENT | 490000.0 | 2.0 | 91.0 | True | False | False | False | 0.0 | False | 0.0 | 0.0 | 2.0 | False | F | good | 649000.0 | 320000.0 | 282500.0 | 2.0 | 2.0 |
| 10045 | 11279 | 4 | 8800 | False | APARTMENT | 265000.0 | 3.0 | 138.0 | True | False | False | False | 0.0 | False | 0.0 | 0.0 | 2.0 | False | F | good | 240000.0 | 320000.0 | 282500.0 | 2.0 | 2.0 |
| 10046 | 11280 | 4 | 6000 | False | APARTMENT | 99000.0 | 2.0 | 91.0 | True | False | False | False | 0.0 | False | 0.0 | 0.0 | 2.0 | False | W | to_renovate | 154500.0 | 230000.0 | 282500.0 | 2.0 | 2.0 |
| 10047 | 11281 | 4 | 2950 | False | LOFT | 410000.0 | 3.0 | 150.0 | True | False | False | True | 41.0 | False | 0.0 | 0.0 | 3.0 | False | F | good | 343500.0 | 320000.0 | 422000.0 | 3.0 | 3.0 |
| 10048 | 11282 | 4 | 4000 | False | APARTMENT | 245000.0 | 2.0 | 103.0 | False | False | False | True | 5.0 | False | 0.0 | 0.0 | 2.0 | False | W | good | 225000.0 | 320000.0 | 282500.0 | 2.0 | 2.0 |
| 10049 | 11283 | 4 | 8790 | False | APARTMENT | 250000.0 | 1.0 | 300.0 | False | False | False | False | 0.0 | False | 0.0 | 0.0 | 2.0 | False | F | good | 257000.0 | 320000.0 | 282500.0 | 2.0 | 2.0 |
| 10050 | 11284 | 4 | 2018 | False | APARTMENT | 298000.0 | 1.0 | 71.0 | True | False | False | True | 12.0 | False | 0.0 | 0.0 | 1.0 | False | F | good | 443475.0 | 320000.0 | 282500.0 | 2.0 | 2.0 |
| 10051 | 11286 | 4 | 2000 | False | FLAT_STUDIO | 150000.0 | 1.0 | 40.0 | True | False | False | False | 0.0 | False | 0.0 | 0.0 | 2.0 | False | F | to_renovate | 497000.0 | 230000.0 | 149000.0 | 2.0 | 2.0 |
| 10052 | 11287 | 4 | 2060 | False | APARTMENT | 228009.0 | 2.0 | 80.0 | True | False | False | False | 0.0 | False | 0.0 | 0.0 | 2.0 | False | F | good | 299000.0 | 320000.0 | 282500.0 | 2.0 | 2.0 |